Planning with Hidden Parameter Polynomial MDPs
Authors
Abstract
For many applications of Markov Decision Processes (MDPs), the transition function cannot be specified exactly. Bayes-Adaptive MDPs (BAMDPs) extend MDPs to consider transition probabilities governed by latent parameters. To act optimally in BAMDPs, one must maintain a belief distribution over the latent parameters. Typically, this belief is described by a set of sample (particle) MDPs and associated weights, which represent the likelihood of each sample MDP being the true underlying MDP. However, as the number of dimensions of the parameter space increases, the number of samples required to represent the belief sufficiently grows exponentially. Thus, maintaining an accurate belief in this form over complex parameter spaces is computationally intensive, which in turn affects the performance of planning for these models. In this paper, we propose an alternative approach. We define a class of BAMDPs in which the belief can be expressed as a closed-form polynomial of the latent parameters, and outline a method for maintaining this closed-form belief over the parameters, which results in an exact representation. Furthermore, this representation does away with the need to tune the number of samples used to represent the belief. We evaluate our approach on two domains and empirically show that the polynomial, closed-form representation yields better plans than the sampling-based one.
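To make the contrast in the abstract concrete, below is a minimal sketch, assuming a toy single-parameter model in which one transition's "success" probability is an unknown theta, so each observed transition is a Bernoulli(theta) draw. It compares a particle-weighting belief update with an exact polynomial posterior stored by its coefficients. All names, priors, and numbers are illustrative assumptions; this is not the construction or algorithm from the paper.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# --- Particle (sampling) belief ---------------------------------------------
# The belief over theta is a set of sampled parameter values plus weights
# proportional to how well each sample explains the observations.
particles = rng.uniform(0.0, 1.0, size=1000)          # candidate thetas
weights = np.full(particles.shape, 1.0 / particles.size)

def particle_update(particles, weights, success):
    """Reweight every particle by the likelihood of the observed transition."""
    likelihood = particles if success else 1.0 - particles
    new_weights = weights * likelihood
    return new_weights / new_weights.sum()

# --- Closed-form polynomial belief ------------------------------------------
# For Bernoulli observations the (unnormalised) posterior density is
# theta**k * (1 - theta)**(n - k): a polynomial in theta, stored exactly
# by its coefficients rather than by a tuned number of particles.
belief_poly = Polynomial([1.0])                        # uniform prior

def polynomial_update(belief_poly, success):
    """Multiply the polynomial belief by the observation likelihood."""
    factor = Polynomial([0.0, 1.0]) if success else Polynomial([1.0, -1.0])
    return belief_poly * factor

# Feed both representations the same observed transitions.
for success in [True, True, False, True]:
    weights = particle_update(particles, weights, success)
    belief_poly = polynomial_update(belief_poly, success)

# Posterior mean of theta: approximate under the particle belief, exact
# (a ratio of polynomial integrals over [0, 1]) under the polynomial belief.
particle_mean = float(np.sum(particles * weights))
numer = (Polynomial([0.0, 1.0]) * belief_poly).integ()
denom = belief_poly.integ()
poly_mean = (numer(1.0) - numer(0.0)) / (denom(1.0) - denom(0.0))
print(f"particle mean ≈ {particle_mean:.3f}, polynomial mean = {poly_mean:.3f}")
```

In this toy setting the polynomial posterior is simply the Beta-shaped likelihood under a uniform prior; the only point of the contrast is that the closed-form belief is maintained exactly by its coefficients, whereas the particle belief's accuracy depends on how many samples are kept.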
Similar Resources
Multiagent Planning with Factored MDPs
We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we as...
Discovering hidden structure in factored MDPs
Markov Decision Processes (MDPs) describe a wide variety of planning scenarios ranging from military operations planning to controlling a Mars rover. However, today’s solution techniques scale poorly, limiting MDPs’ practical applicability. In this work, we propose algorithms that automatically discover and exploit the hidden structure of factored MDPs. Doing so helps solve MDPs faster and with...
Polynomial Value Iteration Algorithms for Deterministic MDPs
Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudopolynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision (DMDP) problems. We show that the basic value iteration procedure converges to th...
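For readers unfamiliar with the procedure that entry analyses, the following is a minimal, generic value-iteration loop on a tiny deterministic MDP. The states, actions, rewards, and discount factor are assumptions made only for the example; it reproduces none of that paper's bounds or proofs.

```python
# A generic value-iteration sketch for a small deterministic MDP (DMDP).
GAMMA = 0.9

# Deterministic transitions: state -> action -> (next_state, reward).
TRANSITIONS = {
    "s0": {"stay": ("s0", 1.0), "go": ("s1", 0.0)},
    "s1": {"back": ("s0", 0.0), "go": ("s2", 0.0)},
    "s2": {"stay": ("s2", 5.0), "back": ("s0", 2.0)},
}

def value_iteration(transitions, gamma, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            best = max(reward + gamma * values[nxt]
                       for nxt, reward in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values

print(value_iteration(TRANSITIONS, GAMMA))
```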
Probabilistic hierarchical planning over MDPs
In this paper, we propose a new approach to using probabilistic hierarchical task networks (HTNs) as an effective method for agents to plan in conditions in which their problem-solving knowledge is uncertain, and the environment is non-deterministic. In such situations it is natural to model the environment as a Markov decision process (MDP). We show that using Earley graphs, it is possible to ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i10.26411